-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add Optional Source Filtering to Source Loaders #113827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Spinoff of elastic#113036. This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @jimczi, I've created a changelog YAML for you. |
.../impl/src/main/java/org/elasticsearch/xcontent/provider/XContentParserConfigurationImpl.java
Outdated
Show resolved
Hide resolved
| protected SourceLoader.SyntheticFieldLoader syntheticFieldLoader(Stream<Mapper> mappers, boolean isFragment) { | ||
| var fields = mappers.sorted(Comparator.comparing(Mapper::fullPath)) | ||
| .map(Mapper::syntheticFieldLoader) | ||
| SourceLoader.SyntheticFieldLoader syntheticFieldLoader(SourceFilter filter, Collection<Mapper> mappers, boolean isFragment) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: keep passing the stream to avoid reconverting the list to a stream?
I also wonder about the overhead here, shall we skip streams altogether and use good-old collections?
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
|
Looks good overall, added Martijn and Sasha to get another look, since this is a core part of synthetic source. Do we have any benchmark results quantifying the impact of this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The xcontent changes LGTM.
libs/x-content/src/main/java/org/elasticsearch/xcontent/XContentParserConfiguration.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, synthetic source changes look good to me. I wonder if there is any performance impact when filtering is not used compared to current state.
server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/XContentDataHelper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
|
@jimczi any chance you can get back to submitting this? It should also help with fetching fields from synthetic source in ES|QL, with proper block loader adjustments. |
|
@kkrik-es I have updated the branch and can proceed with merging this PR as is, provided there are no objections. Please let me know if you have any concerns or additional feedback. |
|
I think we've all approved, very excited to see it submitted. |
💔 Backport failed
You can use sqren/backport to manually backport by running |
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter. This update also modifies the get document API to apply source filters earlier—directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.
Spinoff of #113036.
This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.
This update also modifies the get document API to apply source filters earlier/directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.